Context-Aware Embeddings for Automatic Art Analysis
Automatic art analysis aims to classify and retrieve artistic representations
from a collection of images by using computer vision and machine learning
techniques. In this work, we propose to enhance visual representations from
neural networks with contextual artistic information. Whereas visual
representations are able to capture information about the content and the style
of an artwork, our proposed context-aware embeddings additionally encode
relationships between different artistic attributes, such as author, school, or
historical period. We design two different approaches for using context in
automatic art analysis. In the first one, contextual data is obtained through a
multi-task learning model, in which several attributes are trained together to
find visual relationships between elements. In the second approach, context is
obtained through an art-specific knowledge graph, which encodes relationships
between artistic attributes. An exhaustive evaluation of both of our models in
several art analysis problems, such as author identification, type
classification, or cross-modal retrieval, shows that performance is improved by
up to 7.3% in art classification and 37.24% in retrieval when context-aware
embeddings are used.
EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions [Technical Report]
We introduce EQUI-VOCAL: a new system that automatically synthesizes queries
over videos from limited user interactions. The user only provides a handful of
positive and negative examples of what they are looking for. EQUI-VOCAL
utilizes these initial examples and additional ones collected through active
learning to efficiently synthesize complex user queries. Our approach enables
users to find events without database expertise, with limited labeling effort,
and without declarative specifications or sketches. Core to EQUI-VOCAL's design
is the use of spatio-temporal scene graphs in its data model and query language
and a novel query synthesis approach that works on large and noisy video data.
Our system outperforms two baseline systems -- in terms of F1 score, synthesis
time, and robustness to noise -- and can flexibly synthesize complex queries
that the baselines do not support.
Comment: This is an extended technical report for the following paper: "Enhao
Zhang, Maureen Daum, Dong He, Brandon Haynes, Ranjay Krishna, and Magdalena
Balazinska. EQUI-VOCAL: Synthesizing Queries for Compositional Video Events
from Limited User Interactions. PVLDB, 16(11): 2714-2727, 2023.
doi:10.14778/3611479.3611482"
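For readers unfamiliar with the F1 score used in the comparison above, the following is a minimal sketch of the standard definition (the harmonic mean of precision and recall) for binary labels. The function name and the boolean-list input format are assumptions made for illustration; this is not code from EQUI-VOCAL or its evaluation.

```python
def f1_score(predicted, actual):
    """F1 for two equal-length sequences of boolean labels.

    Hypothetical helper for illustration only; not part of EQUI-VOCAL.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    if true_pos == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Here a video segment would count as a positive when a synthesized query returns it (predicted) or when the user labeled it as the target event (actual).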
A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality
Microtask crowdsourcing is increasingly critical to the creation of extremely
large datasets. As a result, crowd workers spend weeks or months repeating the
exact same tasks, making it necessary to understand their behavior over these
long periods of time. We utilize three large, longitudinal datasets of nine
million annotations collected from Amazon Mechanical Turk to examine claims
that workers fatigue or satisfice over these long periods, producing lower
quality work. We find that, contrary to these claims, workers are extremely
stable in their quality over the entire period. To understand whether workers
set their quality based on the task's requirements for acceptance, we then
perform an experiment where we vary the required quality for a large
crowdsourcing task. Workers did not adjust their quality based on the
acceptance threshold: workers who were above the threshold continued working at
their usual quality level, and workers below the threshold self-selected
out of the task. Capitalizing on this consistency, we demonstrate
that it is possible to predict workers' long-term quality using just a glimpse
of their quality on the first five tasks.
Comment: 10 pages, 11 figures, accepted CSCW 201
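The idea of predicting long-term quality from a short glimpse can be illustrated with a small sketch. It assumes each worker's history is a list of booleans (True when an annotation was judged correct) and uses the mean accuracy of the first five tasks as the estimate. This is a simplifying assumption for illustration; the abstract does not specify the authors' actual predictor, and both function names are hypothetical.

```python
def glimpse_quality(correct_flags, glimpse_size=5):
    """Mean accuracy over the first `glimpse_size` tasks (hypothetical estimator)."""
    glimpse = correct_flags[:glimpse_size]
    if not glimpse:
        raise ValueError("worker has no completed tasks")
    return sum(glimpse) / len(glimpse)


def long_term_quality(correct_flags):
    """Mean accuracy over the worker's entire history."""
    if not correct_flags:
        raise ValueError("worker has no completed tasks")
    return sum(correct_flags) / len(correct_flags)


# Made-up example history: 10 tasks, 8 judged correct.
history = [True, True, False, True, True, True, True, True, True, False]
estimate = glimpse_quality(history)   # uses only the first five tasks
actual = long_term_quality(history)   # uses all ten tasks
```

If workers are as stable as the abstract reports, the two values should stay close for most workers, which is what makes such an early estimate useful.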